Linguistic Knowledge Based Supervised Key-phrase Extraction

Authors

  • Tanmoy Pal
  • Haider Banka
  • Barnan Das
Abstract

The key phrases of a document represent the most important information about its content. In this study, an automatic key phrase extraction algorithm is devised using machine learning techniques. The proposed method considers not only document-level statistics such as TFxIDF but also the linguistic features of the phrases. Experiments were performed with the Naïve Bayes classifier, the J48 decision tree, and the IBk lazy learner to choose the most suitable learning model. The imbalanced class distribution problem is resolved by synthetically oversampling the minority class. The experimental results indicate the accuracy and efficiency of the proposed technique.
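As a rough illustration of two ingredients named in the abstract, the sketch below computes a TFxIDF score for a candidate phrase and generates synthetic minority-class samples by interpolation. The abstract does not give the exact TFxIDF formula or name the oversampling technique, so the smoothed IDF, the random-pair interpolation (a simplification of SMOTE-style oversampling, which would interpolate toward nearest neighbours), and all function names here are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def tf_idf(phrase, doc_phrases, corpus):
    """TFxIDF of `phrase` in one document relative to a corpus.

    doc_phrases: list of candidate phrases in the target document.
    corpus: list of documents, each given as a list of phrases.
    """
    tf = Counter(doc_phrases)[phrase] / len(doc_phrases)
    df = sum(1 for doc in corpus if phrase in doc)
    # Smoothed IDF (an assumption) so that unseen phrases do not divide by zero.
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def oversample_minority(minority, n_new, rng):
    """Create n_new synthetic feature vectors for the minority class.

    Each synthetic sample lies on the line segment between two randomly
    chosen minority samples (hypothetical simplification: random pairs
    instead of k-nearest neighbours).
    """
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


if __name__ == "__main__":
    # Toy corpus: each document is a list of candidate phrases.
    corpus = [
        ["key phrase", "machine learning", "key phrase"],
        ["decision tree", "machine learning"],
        ["naive bayes", "lazy learner"],
    ]
    print(tf_idf("key phrase", corpus[0], corpus))

    # Toy minority class (key phrases) as 2-dimensional feature vectors.
    key_phrase_vectors = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2]]
    print(oversample_minority(key_phrase_vectors, 4, random.Random(0)))
```

In a setup like this, the TFxIDF score would be one feature among several (alongside linguistic features) fed to a classifier, and the synthetic samples would be added to the training set only, never to the evaluation data.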

Similar resources

A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network

It is a fundamental and important task to extract key phrases from documents. Generally, phrases in a document are not independent in delivering the content of the document. In order to capture and make better use of their relationships in key phrase extraction, we suggest exploring the Wikipedia knowledge to model a document as a semantic network, where both n-ary and binary relationships amon...

Accurate Keyphrase Extraction from Scientific Papers by Mining Linguistic Information

In this paper we investigate the impact of candidate terms filtering using linguistic information on the accuracy of automatic keyphrase extraction from scientific papers. According to linguistic knowledge, the noun phrases are most likely to be keyphrases. However the definition of a noun phrase can vary from a system to another. We have identified five POS tag sequence definitions of a noun p...

Combining Supervised Learning Techniques to Key-Phrase Extraction for Biomedical Full-Text

Key-phrase extraction plays a useful role in research areas of Information Systems (IS) like digital libraries. Short metadata like key phrases are beneficial for searchers to understand the concepts found in the documents. This paper evaluates the effectiveness of different supervised learning techniques on biomedical full-text: Sequential Minimal Optimization (SMO) and K-Nearest Neighbor, b...

Automatically Labeled Data Generation for Large Scale Event Extraction

Modern models of event extraction for tasks like ACE are based on supervised learning of events from small hand-labeled data. However, hand-labeled training data is expensive to produce, in low coverage of event types, and limited in size, which makes supervised methods hard to extract large scale of events for knowledge base population. To solve the data labeling problem, we propose to automat...

Chinese Common Noun Phrase Resolution: An Unsupervised Probabilistic Model Rivaling Supervised Resolvers

Pronoun resolution and common noun phrase resolution are the two most challenging subtasks of coreference resolution. While a lot of work has focused on pronoun resolution, common noun phrase resolution has almost always been tackled in the context of the larger coreference resolution task. In fact, to our knowledge, there has been no attempt to address Chinese common noun phrase resolution as ...

Publication date: 2011